fix: Summit cold start from checkpoint deadlock by HenryMBaldwin · Pull Request #131 · SeismicSystems/summit

HenryMBaldwin · 2026-02-26T13:10:50Z

Issue

When starting a wiped node from a summit checkpoint where the execution client has clean state, summit deadlocks (irrecoverably AFAIK) because it treats SYNCING from the execution client as a failure.

Solution

This is solved with an initial syncing phase, prompted by sending an initial forkchoice update to the execution client to set the sync target, and polling until it's done syncing.
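A minimal sketch of the startup phase described above, not the code from this PR. `Status`, `ForkchoiceClient`, `wait_until_synced`, and `ScriptedClient` are hypothetical names. The sketch assumes that re-sending the forkchoice update doubles as the poll, and it adds a `max_polls` bound that the real node may not have, so the example terminates.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for the engine API status (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Valid,
    Syncing,
}

/// Hypothetical minimal view of the execution client used during startup.
pub trait ForkchoiceClient {
    /// Sends a forkchoice update for `head` and returns the reported status.
    fn forkchoice_update(&mut self, head: [u8; 32]) -> Status;
}

/// Sends forkchoice updates until the client stops reporting SYNCING.
/// Returns the number of polls made, or `None` if `max_polls` was exhausted.
pub fn wait_until_synced<C: ForkchoiceClient>(
    client: &mut C,
    head: [u8; 32],
    max_polls: usize,
) -> Option<usize> {
    for attempt in 1..=max_polls {
        match client.forkchoice_update(head) {
            Status::Valid => return Some(attempt),
            // A real node would sleep between polls; omitted to keep the sketch deterministic.
            Status::Syncing => continue,
        }
    }
    None
}

/// Scripted client for demonstration: replays queued statuses, then reports VALID.
pub struct ScriptedClient {
    pub responses: VecDeque<Status>,
}

impl ForkchoiceClient for ScriptedClient {
    fn forkchoice_update(&mut self, _head: [u8; 32]) -> Status {
        self.responses.pop_front().unwrap_or(Status::Valid)
    }
}
```

With two queued `Syncing` responses, `wait_until_synced` returns on the third poll. A client that never leaves `Syncing` exhausts the bound, which mirrors the stall case where the execution client cannot sync.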

We also attempt to sync in execute_block if the execution client returns SYNCING. This situation means something has likely gone catastrophically wrong and should never happen with our reth client, so we log an error instead of a warning while waiting for the execution client to recover.

Notably, if the execution client doesn't have any peers or is unable to sync for any reason, this also effectively stalls the node. IMO this is still a strict improvement over the previous behavior.


daltoncoder

Nice, looks good, just some minor comments.

daltoncoder · 2026-03-02T20:19:23Z

finalizer/src/actor.rs

+                    } else {
+                        warn!(
+                            ?status,
+                            "unexpected response to initial forkchoice update, proceeding anyway"
+                        );
+                        break;
+                    }


The only other status not covered in the other branches is Invalid

/// INVALID is returned by the engine API in the following calls:
/// - forkchoiceUpdate: if the new head is unknown, pre-merge, or reorg to it fails

Since this is on startup, this is going to be unrecoverable for a node, and it will crash as soon as it gets a block. Let's just panic here now with a helpful message, something like "Finalizer started with invalid forkchoice".
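One way to express this suggestion is an exhaustive match with no catch-all branch, so every status is handled explicitly. This is a sketch with hypothetical names (`Status`, `StartupAction`, `on_initial_forkchoice`), not the change that was merged. Only the panic message wording comes from the comment above.

```rust
/// Simplified stand-in for the statuses a forkchoice update can report (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Valid,
    Syncing,
    Invalid,
}

/// What the startup loop should do next (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum StartupAction {
    Proceed,
    KeepPolling,
}

/// Maps the response to the initial forkchoice update onto a startup decision.
/// INVALID is treated as unrecoverable and panics with a descriptive message.
pub fn on_initial_forkchoice(status: Status) -> StartupAction {
    match status {
        Status::Valid => StartupAction::Proceed,
        Status::Syncing => StartupAction::KeepPolling,
        Status::Invalid => panic!("Finalizer started with invalid forkchoice"),
    }
}
```

Because the match has no wildcard arm, adding a new status variant later becomes a compile error at this site instead of silently falling into a "proceeding anyway" path.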

daltoncoder · 2026-03-02T20:51:14Z

finalizer/src/actor.rs

+                height = block.height(),
+                "execution client returned SYNCING, sending forkchoice update to trigger sync and retrying..."
+            );
+            engine_client.commit_hash(state.forkchoice).await;


We discussed this in the office, but this commit_hash call is not needed and we should remove it.

daltoncoder · 2026-03-02T20:51:53Z

finalizer/src/actor.rs

+        (true, false) => {
+            warn!(
+                new_height,
+                "payload valid but parent hash mismatch, not executing"
+            );
+        }


As far as I can tell, this branch is unreachable. Waiting for @matthias-wright to also check this out, but if that is the case, this whole thing can be simplified to just check payload_status.is_valid().
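A sketch of the proposed simplification, under the reviewer's assumption that a valid payload always has a matching parent hash. `PayloadStatus` here is a local hypothetical enum, and both function names are illustrative. The surrounding finalizer code is not shown in this PR excerpt.

```rust
/// Simplified stand-in for the engine payload status (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PayloadStatus {
    Valid,
    Invalid,
    Syncing,
}

impl PayloadStatus {
    pub fn is_valid(self) -> bool {
        matches!(self, PayloadStatus::Valid)
    }
}

/// Two-flag form, as in the hunk above: validity and parent-hash match are checked together.
pub fn should_execute_two_flags(status: PayloadStatus, parent_matches: bool) -> bool {
    match (status.is_valid(), parent_matches) {
        (true, true) => true,
        // The branch believed to be unreachable.
        (true, false) => false,
        (false, _) => false,
    }
}

/// Simplified form: relies on validity alone.
pub fn should_execute_simplified(status: PayloadStatus) -> bool {
    status.is_valid()
}
```

The two functions agree whenever `parent_matches` is true, so the simplification is behavior-preserving only if the `(true, false)` case really cannot occur.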

daltoncoder · 2026-03-02T20:52:49Z

node/src/test_harness/mock_engine_client.rs

+    // Response override queues
+    check_payload_overrides: VecDeque<PayloadStatus>,
+    commit_hash_overrides: VecDeque<ForkchoiceUpdated>,


Cool nice solution to fixing up the MockEngineClient
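For readers unfamiliar with the pattern, a response override queue can be sketched roughly as below. Only the field name `check_payload_overrides` and the use of `VecDeque` come from the hunk above. `Status`, `MockEngine`, `push_check_payload_override`, and the default of `Valid` are assumptions for illustration, not the actual MockEngineClient API.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for the payload status type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Valid,
    Syncing,
}

/// Mock engine whose responses can be scripted per call.
#[derive(Default)]
pub struct MockEngine {
    // Queued responses are consumed in FIFO order, one per call.
    check_payload_overrides: VecDeque<Status>,
}

impl MockEngine {
    /// Queues a response to be returned by a future `check_payload` call.
    pub fn push_check_payload_override(&mut self, status: Status) {
        self.check_payload_overrides.push_back(status);
    }

    /// Returns the next queued override, or the default (VALID) when the queue is empty.
    pub fn check_payload(&mut self) -> Status {
        self.check_payload_overrides
            .pop_front()
            .unwrap_or(Status::Valid)
    }
}
```

A test can queue one `Syncing` response to exercise the retry path and then let the mock fall back to its default behavior on the next call.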

daltoncoder · 2026-03-02T20:53:57Z

node/src/engine.rs

 pub const BLOCKS_PER_EPOCH: u64 = 10;
 #[cfg(all(not(debug_assertions), not(feature = "e2e")))]
-const BLOCKS_PER_EPOCH: u64 = 10000;
+const BLOCKS_PER_EPOCH: u64 = 50;


Let's revert this until the PR that adds this to the genesis file lands. It will break some test binaries we have.

HenryMBaldwin added 10 commits February 25, 2026 17:57

Temporarily set blocks per epoch to 50

81ce655

Ignore local nvim config.

69add40

fix: Address syncing response from execution client and send initial …

218afcc

…forkchoice if syncing is required

Move sync to startup to avoid deadlocking the finalizer.

b051f9b

Cargo fmt.

2692454

impl new functions in mocks.

d8c0b73

Add retry on syncing in execute_block

ae3767e

Add syncing orchestration for testing.

24b57ed

Add syncing tests.

a5575de

Cargo fmt.

50e3cda

HenryMBaldwin requested a review from daltoncoder March 2, 2026 17:22

daltoncoder requested changes Mar 2, 2026


HenryMBaldwin added 4 commits March 2, 2026 17:07

Remove temp epoch reduction.

2538dd2

Remove commit hash in sync loop.

b17a3d9

Panic if initial sync loop recieves invalid response.

ad5adcd

Revert validity check from match stament and simplify conditional.

3d5105a

HenryMBaldwin merged commit a9feea4 into h/staking-and-joining Mar 3, 2026

HenryMBaldwin deleted the h/fix-summit-checkpoint-deadlock branch March 3, 2026 19:03

HenryMBaldwin merged 14 commits into h/staking-and-joining from h/fix-summit-checkpoint-deadlock
